Cleansing Data for Mining and Warehousing
Authors
Abstract
Given the rapid growth of data, it is important to extract, mine, and discover useful information from databases and data warehouses. The process of data cleansing is crucial because of the "garbage in, garbage out" principle. Dirty data files are prevalent because of incorrect or missing data values, inconsistent value naming conventions, and incomplete information. Hence, we may have multiple records referring to the same real-world entity. In this paper, we examine the problem of detecting and removing duplicate records. We present several efficient techniques to pre-process the records before sorting them, so that potentially matching records are brought into a close neighbourhood. Based on these techniques, we implement a data cleansing system which can detect and remove more duplicate records than existing methods.
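The idea of pre-processing records, sorting them, and then comparing only close neighbours can be illustrated with a minimal Python sketch. It is not the authors' system: the abbreviation table, the token-sorting step, the window size, the similarity threshold, and the function names `preprocess` and `find_duplicates` are all assumptions chosen for illustration, and the similarity measure is the standard library's `difflib.SequenceMatcher`.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real system would need a much larger one.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}


def preprocess(record: str) -> str:
    """Normalise a raw record into a sort key (assumed pre-processing steps)."""
    # Lowercase and replace punctuation with spaces.
    text = re.sub(r"[^a-z0-9\s]", " ", record.lower())
    # Expand abbreviations so that naming variants agree.
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    # Sort tokens so that field order does not affect the key.
    return " ".join(sorted(tokens))


def find_duplicates(records, window=3, threshold=0.85):
    """Return sorted index pairs of records that look like duplicates."""
    # Sort by the normalised key so similar records end up adjacent.
    keyed = sorted((preprocess(r), i) for i, r in enumerate(records))
    pairs = []
    for pos, (key, idx) in enumerate(keyed):
        # Compare each record only with the next (window - 1) neighbours.
        for other_key, other_idx in keyed[pos + 1 : pos + window]:
            if SequenceMatcher(None, key, other_key).ratio() >= threshold:
                pairs.append(tuple(sorted((idx, other_idx))))
    return sorted(pairs)


records = [
    "John Smith, 12 Main St.",
    "Alice Wong, 7 Hill Rd",
    "Smith John 12 main street",
    "Bob Lee, 3 Park Ave",
]
print(find_duplicates(records))  # [(0, 2)]
```

Without the pre-processing step, the first and third records would sort far apart ("John ..." versus "Smith ...") and could fall outside the comparison window. After normalisation they share the same key and become adjacent.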
Similar resources
Infrastructure, Data Cleansing and Mining for Support of Scientific Simulations
by Yingping Huang. We propose a multi-tier infrastructure which demonstrates the successful integration of web servers, application servers, databases, data analysis and reports, data cleansing, data warehousing, data mining, and the Swarm/RePast simulation models. The goal of the system is to support scientific simulations in the fields of environmental and social science using advanced features...
Data Warehouse Back-End Tools
The back-end tools of a data warehouse are pieces of software responsible for the extraction of data from several sources, their cleansing, customization, and insertion into a data warehouse. They are known under the general term extraction, transformation and loading (ETL) tools. In all the phases of an ETL process (extraction and exportation, transformation and cleaning, and loading), individ...
Encyclopedia of Data Warehousing and Mining, Second Edition (4 Volumes)
The Encyclopedia of Data Warehousing and Mining, Second Edition offers thorough exposure to the issues of importance in the rapidly changing field of data warehousing and mining. This essential reference source informs decision makers, problem solvers, and data mining specialists in business, academia, government, and other settings with over 300 entries on theories, methodologies, functionalit...
Chapter I, "Combining Data Warehousing and Data Mining Techniques for Web Log Analysis"
In enterprises, a large volume of data has been collected and stored in data warehouses. Advances in data gathering, storage, and distribution have created a need for integrating data warehousing and data mining techniques. Mining data warehouses raises unique issues and requires special attention. Data warehousing and data mining are interrelated, and require holistic techniques from the two ...
Model Free Data Mining
This is the second volume of the Advances in Data Warehousing and Mining (ADWM) book series. ADWM publishes books in the areas of data warehousing and mining. The topic of this volume is data mining and knowledge discovery. This volume consists of 14 chapters in four sections, contributed by authors and editorial board members from the International Journal of Data Warehousing and Mining, as wel...
On Supporting the Data Warehouse Design by Data Mining Techniques
Integration of data warehousing and data mining by applying the latter as a front end technology to the former is a known approach. Nevertheless, the data warehouse design process can also be seen as an area of application for data mining techniques. In this paper, we compile the requirements of the data warehouse design process concerning data source analysis, structural integration of data so...
Publication date: 1999